Entropic Graph-based Posterior Regularization
Abstract
Graph smoothness objectives have achieved great success in semi-supervised learning but have not yet been applied extensively to unsupervised generative models. We define a new class of entropic graph-based posterior regularizers that augment a probabilistic model by encouraging pairs of nearby variables in a regularization graph to have similar posterior distributions. We present a three-way alternating optimization algorithm with closed-form updates for performing inference on this joint model and learning its parameters. This method admits updates linear in the degree of the regularization graph, exhibits monotone convergence, and is easily parallelizable. We are motivated by applications in computational biology in which temporal models such as hidden Markov models are used to learn a human-interpretable representation of genomic data. On a synthetic problem, we show that our method outperforms existing methods for graph-based regularization and a comparable strategy for incorporating long-range interactions using existing methods for approximate inference. Using genome-scale functional genomics data, we integrate genome 3D interaction data into existing models for genome annotation and demonstrate significant improvements in predicting genomic activity.¹

¹ Due to space constraints, this manuscript omits some proofs and experiments, as noted below. Please refer to the extended version (Libbrecht et al., 2015) for these sections.

Graph-based methods have recently been successful in solving many types of semi-supervised learning problems (Chapelle et al., 2006; Das & Smith, 2011; Joachims, 1999; Subramanya et al., 2010; Subramanya & Bilmes, 2011; Zhu et al., 2004; Zhu & Ghahramani, 2002). These methods assume that data instances lie on a low-dimensional manifold that may be represented as a graph. They optimize a graph smoothness criterion, which states that data instances nearby in the graph should be more likely to receive the same label. In a semi-supervised learning setting, optimizing this criterion has the effect of spreading labels from labeled to unlabeled instances (a minimal sketch of this behavior is given below).

Despite the success of graph-based methods for semi-supervised learning, there has been much less study of graph smoothness objectives in an unsupervised setting. In unsupervised problems, we do not have labels but instead have a generative model that is assumed to explain the observed data given the latent labels. While some types of relationships between instances (for example, the relationship between neighboring words in a sentence or neighboring bases in a genome) can easily be incorporated into the generative model, it is often inappropriate to encode a graph smoothness assumption into the model this way, for two reasons. First, in some cases it is not clear what probabilistic process generated the labels with respect to the graph, and some objectives and distance measures that are successful for semi-supervised learning do not have probabilistic analogues. Second, large models must obey factorization properties (e.g., a tree or chain structure, as in hidden Markov models) to permit efficient dynamic programming algorithms such as belief propagation. Graphs representing similarity between variables do not in general satisfy these structural requirements, because they tend to be densely clustered, leading to very high-order factors.
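For concreteness, the following is a minimal sketch of the label-spreading behavior described above, in the spirit of iterative label propagation (e.g., Zhu & Ghahramani, 2002). The path graph, clamping scheme, and averaging update are simplified illustrations chosen for brevity, not a faithful reimplementation of any one cited method.

```python
import numpy as np

# Minimal label-propagation illustration on a five-node path graph:
# nodes 0 and 4 are labeled (classes 0 and 1); labels spread to the rest.
n, n_classes = 5, 2
W = np.zeros((n, n))
for i in range(n - 1):                          # similarity graph: unweighted path
    W[i, i + 1] = W[i + 1, i] = 1.0

labeled = {0: 0, 4: 1}                          # node -> observed class
F = np.full((n, n_classes), 1.0 / n_classes)    # soft label distributions
for node, cls in labeled.items():
    F[node] = np.eye(n_classes)[cls]

for _ in range(100):
    F = W @ F / W.sum(axis=1, keepdims=True)    # average neighbors' distributions
    for node, cls in labeled.items():           # clamp the labeled nodes
        F[node] = np.eye(n_classes)[cls]

print(np.round(F, 2))   # interior nodes interpolate smoothly between the two labels
```

Nodes with no direct label still receive a soft label purely through their graph neighbors, which is exactly the behavior the smoothness criterion formalizes.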
In this paper, therefore, we propose a new regularization approach for expressing a graph smoothness objective over a probabilistic model. We employ the posterior regularization (PR) framework of Ganchev et al. (2010), in which a probabilistic model is regularized through a term defined on an auxiliary posterior distribution variable. We define a powerful posterior regularizer that encourages pairs of variables to have similar posterior distributions by adding a penalty based on their Kullback-Leibler (KL) divergence. The pairs of penalized variables are encoded in a regularization graph, which may be entirely different from the graphical model on which inference is performed. This regularization graph need not have low treewidth and admits efficient optimization even when fully connected. We call our strategy of adding KL regularization penalties entropic graph-based posterior regularization (EGPR).

We show that inference and learning using this regularizer can be performed efficiently with a three-way alternating optimization algorithm with closed-form updates. This algorithm alternates between (1) smoothing marginal posteriors according to a regularization similarity graph, (2) performing probabilistic inference in a graphical model with the same dependence structure as the unregularized model, and (3) updating model parameters (a toy sketch of this loop is given below). The updates are linear in the degree of the regularization graph and are easily parallelizable, in our experiments scaling to tens of millions of variables. We show that this procedure corresponds to a generalization of the EM algorithm.

We apply this approach to improve existing methods for annotating the human genome (Day et al., 2007; Hoffman et al., 2012a; Ernst & Kellis, 2010). Methods for genome annotation distill genomic data into a human-interpretable form by simultaneously partitioning the genome into non-overlapping segments and assigning a label to each segment. This type of analysis has recently had great success in interpreting the function of the human genome and formed an integral part of the analysis of the NIH-sponsored ENCODE project (ENCODE Project Consortium, 2012; Hoffman et al., 2012b; http://www.nature.com/encode). However, existing annotation methods use temporal models such as hidden Markov models and therefore cannot efficiently incorporate data on the genome's 3D structure, which has been shown to play a key role in gene regulation and other genomic processes. In our experiments on synthetic data, a model using EGPR outperforms comparable models that use either other regularization strategies (e.g., squared error) or loopy belief propagation. On ENCODE data, a model using EGPR predicts genome activity much more accurately than the currently used chain models as well as other forms of regularization. Thus, EGPR provides a method for jointly modeling genome activity and 3D structure.
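Before the objective is developed formally, the sketch below illustrates the shape of that three-way loop on a toy one-dimensional problem. It is a hedged illustration only: the independent two-state emission model, the nearest-neighbor regularization graph, and especially the `smooth` step (simple weighted averaging) are stand-ins chosen for brevity, not the paper's model or its closed-form KL-based updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1-D observations from two latent states; a two-component Gaussian
# mixture stands in for the emission model (an HMM would add transition factors).
x = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])
n = len(x)

# Regularization graph: tie each point to its nearest neighbor in observation space.
order = np.argsort(x)
edges = [(order[i], order[i + 1]) for i in range(n - 1)]
lam = 0.5                      # regularization strength; edge weights w(u, v) = 1 here
mu = np.array([-1.0, 1.0])     # initial state means (the parameters theta)

def posteriors(mu):
    """Step (2): per-variable posteriors p_theta(h_i = k | x_i) under the toy model."""
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def smooth(q):
    """Step (1): pull each marginal toward its neighbors in the regularization graph.

    The paper derives closed-form KL-based updates; this weighted averaging is only
    an illustrative substitute with the same qualitative smoothing effect.
    """
    q = q.copy()
    for (u, v) in edges:
        avg = 0.5 * (q[u] + q[v])
        q[u] = (q[u] + lam * avg) / (1.0 + lam)
        q[v] = (q[v] + lam * avg) / (1.0 + lam)
    return q

for _ in range(50):
    q = smooth(posteriors(mu))                           # steps (2), then (1)
    mu = (q * x[:, None]).sum(axis=0) / q.sum(axis=0)    # step (3): M-step-style update

print("estimated state means:", np.round(mu, 2))
```

Even in this toy setting the loop has the structure described above: probabilistic inference under the current parameters, graph-based smoothing of the resulting marginals, and a closed-form parameter update.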
1. Proposed Method

In an unsupervised learning problem, we are given a set of vertices $V$ that index a set of $n = |V|$ random variables $X_V = \{X_1, \ldots, X_n\}$, together with a conditional dependence graph $G = (V, E)$. The graphical model describes a probability distribution, parameterized by $\theta$, that factorizes as
$$p_\theta(x_V) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \phi^{(C)}_\theta(x_C),$$
where each $C \subseteq V$ is a fully connected clique in $G$. We denote random variables with capital letters (e.g., $X_H$) and instantiations of variables with lower case (e.g., $x_H \in \operatorname{domain}(X_H)$). We also use capitals to denote sets and lower case to denote set elements (e.g., $X_h$ for $h \in H$).

Training graphical models involves a set of observed data $\bar{x}_O$, where a subset of variables $O \subseteq V$ is observed and the remainder $H = V \setminus O$ are hidden. When the probability distribution is governed by a set of parameters $\theta$, penalized maximum likelihood training corresponds to the optimization
$$\operatorname*{maximize}_{\theta} \; J(\theta) \triangleq L(\theta) + R(\theta), \tag{1}$$
where
$$L(\theta) \triangleq \log p_\theta(\bar{x}_O) = \log \sum_{x_H} p_\theta(x_H, \bar{x}_O), \tag{2}$$
and where $R(\theta)$ is a regularizer that expresses prior knowledge about the parameters. Many regularizers are used in practice, such as the $\ell_2$ or $\ell_1$ norms, which encourage parameters to be small or sparse, respectively.

Instead of placing a regularizer on the parameters themselves, it is often more natural to place a regularizer on the posterior distribution, a technique called posterior regularization (Ganchev et al., 2010). This is done by introducing an auxiliary joint distribution $q(X_H)$, placing a regularizer on $q(X_H)$, and encouraging $q$ to be similar to $p_\theta$ via a KL divergence penalty. The regularizer is
$$R_{\mathrm{PR}}(\theta) \triangleq \max_{q} R_{\mathrm{PR}}(\theta, q), \tag{3}$$
$$R_{\mathrm{PR}}(\theta, q) \triangleq -D\bigl(q(X_H) \,\|\, p_\theta(X_H \mid \bar{x}_O)\bigr) + \mathrm{PR}(q), \tag{4}$$
where $D(\cdot\|\cdot)$ is the KL divergence $D(p(X)\|q(X)) = \sum_x p(x)\log\bigl(p(x)/q(x)\bigr)$ and $\mathrm{PR}(q)$ is a penalty term that expresses some prior knowledge about the posterior distribution. For notational convenience, we also define $J'(\theta, q) \triangleq L(\theta) + R'(\theta, q)$. (Note that when $\mathrm{PR}(q) \equiv 0$, the maximum in Eq. (3) is attained at $q = p_\theta(X_H \mid \bar{x}_O)$ and $R_{\mathrm{PR}}(\theta) = 0$, so Eq. (1) reduces to ordinary maximum likelihood training; a nonzero $\mathrm{PR}(q)$ is what makes the resulting procedure a generalization of EM.)

Ganchev et al. (2010) showed how to optimize this combined objective efficiently when $\mathrm{PR}(q)$ is a sum of terms over individual cliques in the model. Such regularizers can be used for constraining the posteriors of individual variables in expectation, among other applications. However, graph smoothness objectives cannot be expressed this way, because they involve arbitrary pairs of variables.

When we have a graph smoothness assumption, we are given a weighted, undirected regularization graph over the hidden variables, $G_R = (H, E_R)$, where $E_R \subseteq H \times H$ is a set of edges with non-negative similarity weights $w : E_R \to \mathbb{R}_+$, such that a large $w(u, v)$ indicates a strong belief that $X_u$ and $X_v$ should be similar. The regularization graph $G_R$ is entirely separate from the conditional dependence graph $G$ and, in particular, need not obey any decomposition or factorization properties to admit efficient inference. He et al. (2013) introduced a regularizer of this graph-based form, with a hyperparameter $\lambda_G$ controlling the strength of regularization.
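To make the idea of a graph-based posterior penalty concrete, the sketch below evaluates one possible pairwise penalty: a symmetrized KL divergence between per-variable marginals, weighted by $w(u, v)$ and scaled by $\lambda_G$. This is an illustrative assumption for exposition; the exact penalties used by He et al. (2013) and by EGPR are defined in the respective papers and differ in their precise form.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def graph_posterior_penalty(marginals, edges, weights, lam):
    """Illustrative penalty: lam * sum_{(u,v) in E_R} w(u,v) * symmetrized-KL(q_u, q_v).

    marginals: dict mapping hidden variable -> posterior marginal over labels (1-D array)
    edges:     edges (u, v) of the regularization graph G_R
    weights:   dict mapping (u, v) -> non-negative similarity weight w(u, v)
    lam:       regularization strength (lambda_G)
    """
    total = 0.0
    for (u, v) in edges:
        total += weights[(u, v)] * (kl(marginals[u], marginals[v])
                                    + kl(marginals[v], marginals[u]))
    return lam * total

# Toy check: the penalty is zero when tied marginals agree and grows as they diverge.
q = {0: np.array([0.8, 0.2]), 1: np.array([0.8, 0.2]), 2: np.array([0.1, 0.9])}
edges, weights = [(0, 1), (1, 2)], {(0, 1): 1.0, (1, 2): 1.0}
print(graph_posterior_penalty(q, edges, weights, lam=0.5))
```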
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015